Let us use various data science tools in R (tidyverse, tidycensus) to explore the demographics of Santa Clara County. This dynamic document was produced with Quarto, which supports R, Python, and Julia.
There are 408 census tracts in Santa Clara County. On average, each census tract contains about 4,000 residents. These tracts define our communities, and we want to visualize disparities in health, income, education, and housing across them.
Using the PL 94-171 Redistricting Data Summary File
Note: 2020 decennial Census data use differential privacy, a technique that
introduces errors into data to preserve respondent confidentiality.
ℹ Small counts should be interpreted with caution.
ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
This message is displayed once per session.
Code
# National population size from 2020 Decennial Census.
uspop20 <- get_decennial(
  geography = "state",
  variables = "P2_001N",
  year = 2020
)
Getting data from the 2020 decennial Census
Using the PL 94-171 Redistricting Data Summary File
Santa Clara County has 1,936,259 residents, about 0.6% of the US population (2020 Decennial Census).
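The share quoted above can be checked with a short calculation. This is a minimal sketch, not the code that produced the sentence: the county count comes from the text, while `us_pop` is an assumption, hard-coded here as the 2020 Census resident population of the 50 states and DC rather than summed from `uspop20`. It gives a share of roughly 0.6%, which becomes 1% when rounded to a whole percent.

```r
# County population as reported in the text (2020 Decennial Census)
scc_pop <- 1936259

# Assumption: 2020 Census resident population of the 50 states and DC,
# hard-coded so the sketch runs without an API call
us_pop <- 331449281

# Share of the national population, in percent
scc_share_pct <- 100 * scc_pop / us_pop

round(scc_share_pct, 1)  # one decimal place
round(scc_share_pct)     # whole percent
```

In the document itself, the national total would more naturally come from `sum(uspop20$value)`. Note that a state-level query may also return Puerto Rico, which lowers the share slightly.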
2.1 Age and sex
Code
# 2021 1-year ACS data for SC County
scc_acs1_2021 <- get_acs(
  geography = "county",
  county = c("Santa Clara"),
  state = "CA",
  year = 2021,
  variables = c(med_age = "B01002_001"),
  survey = "acs1"
)
Getting data from the 2021 1-year ACS
The 1-year ACS provides data for geographies with populations of 65,000 and greater.
The median age is 38.2 in Santa Clara County (2021 ACS).
Population by Age and Sex (Pyramid Plot):
Code
# ingest
scc_pyramid <- get_estimates(
  geography = "county",
  county = c("Santa Clara"),
  state = "CA",
  product = "characteristics",
  breakdown = c("SEX", "AGEGROUP"),
  breakdown_labels = TRUE,
  year = 2019
) %>%
  # wrangle: keep the age-group rows and drop the combined-sex totals
  filter(str_detect(AGEGROUP, "^Age"), SEX != "Both sexes") %>%
  # negate male counts so the two sexes plot on opposite sides of zero
  mutate(value = ifelse(SEX == "Male", -value, value)) %>%
  # visualize
  ggplot(aes(x = value, y = AGEGROUP, fill = SEX)) +
  geom_col(width = 0.95, alpha = 0.75) +
  theme_minimal(base_family = "Verdana", base_size = 12) +
  scale_x_continuous(
    labels = ~ number_format(scale = .001, suffix = "k")(abs(.x)),
    limits = 1000000 * c(-0.1, 0.1)
  ) +
  scale_y_discrete(labels = ~ str_remove_all(.x, "Age\\s|\\syears")) +
  scale_fill_manual(values = c("darkred", "navy")) +
  labs(
    x = "",
    # get_estimates() returns Population Estimates Program data, not ACS data
    y = "2019 Census Bureau population estimate",
    title = "Population structure in Santa Clara County",
    fill = "",
    caption = "Data source: US Census Bureau population estimates"
  )

ggplotly(scc_pyramid)
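The pyramid relies on one small trick: male counts are negated so that the two sexes extend in opposite directions from zero, and the axis labels then show absolute values. The base R sketch below illustrates the idea on made-up numbers. The data frame `toy` and the helper `format_k` are hypothetical and stand in for the `get_estimates()` output and the `number_format()` call used above.

```r
# Toy counts (made-up numbers, for illustration only)
toy <- data.frame(
  SEX = c("Male", "Female", "Male", "Female"),
  AGEGROUP = c("Age 0 to 4 years", "Age 0 to 4 years",
               "Age 5 to 9 years", "Age 5 to 9 years"),
  value = c(52000, 50000, 55000, 53000)
)

# Negate male counts so the bars point left of zero
toy$value <- ifelse(toy$SEX == "Male", -toy$value, toy$value)

# Hypothetical helper: label in thousands using the absolute value,
# so the negated male counts do not show a minus sign
format_k <- function(x) paste0(abs(x) / 1000, "k")
toy$label <- format_k(toy$value)

# Strip the "Age " prefix and " years" suffix, as scale_y_discrete() does above
toy$age <- gsub("Age\\s|\\syears", "", toy$AGEGROUP, perl = TRUE)

toy
```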
The following table lists the multigroup segregation index \(H\) for urban areas in California with populations of 750,000 or more. A higher \(H\) indicates more segregation.
Code
library(segregation)

# Get California tract data by race/ethnicity
ca_acs_data <- get_acs(
  geography = "tract",
  variables = c(
    white = "B03002_003",
    black = "B03002_004",
    asian = "B03002_006",
    hispanic = "B03002_012"
  ),
  state = "CA",
  geometry = TRUE,
  year = 2019
)
Getting data from the 2015-2019 5-year ACS
Code
# Use tidycensus to get urbanized areas by population with geometry,
# then filter for those that have populations of 750,000 or more
us_urban_areas <- get_acs(
  geography = "urban area",
  variables = "B01001_001",
  geometry = TRUE,
  year = 2019,
  survey = "acs1"
) %>%
  filter(estimate >= 750000) %>%
  transmute(urban_name = str_remove(NAME, fixed(", CA Urbanized Area (2010)")))
Getting data from the 2019 1-year ACS
The 1-year ACS provides data for geographies with populations of 65,000 and greater.
Code
# Compute an inner spatial join between the California tracts and the
# urbanized areas, returning tracts in the largest California urban
# areas with the urban_name column appended
ca_urban_data <- ca_acs_data %>%
  st_join(us_urban_areas, left = FALSE) %>%
  select(-NAME) %>%
  st_drop_geometry()

mutual_within(
  data = ca_urban_data,
  group = "variable",
  unit = "GEOID",
  weight = "estimate",
  within = "urban_name",
  wide = TRUE
) %>%
  select(urban_name, H) %>%
  arrange(desc(H))
Key: <urban_name>
                         urban_name         H
                             <char>     <num>
1: Los Angeles--Long Beach--Anaheim 0.2851662
2:           San Francisco--Oakland 0.2116127
3:                        San Diego 0.2025728
4:                         San Jose 0.1829190
5:                       Sacramento 0.1426804
6:        Riverside--San Bernardino 0.1408461
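To make the \(H\) values in the table above easier to interpret, here is a minimal base R sketch of the calculation on toy data. It assumes \(H\) is the Mutual Information index \(M\) divided by the entropy of the overall group distribution, which is my understanding of how the segregation package normalizes it. The function `theil_h` and both matrices are hypothetical and are not part of the package.

```r
# Hypothetical helper: rows are units (e.g. tracts), columns are groups
theil_h <- function(counts) {
  p <- counts / sum(counts)          # joint proportions of unit and group
  p_unit <- rowSums(p)               # share of population in each unit
  p_group <- colSums(p)              # share of population in each group
  expected <- outer(p_unit, p_group) # joint proportions under independence

  # Mutual Information index M, skipping empty cells (0 * log(0) is taken as 0)
  nz <- p > 0
  m <- sum(p[nz] * log(p[nz] / expected[nz]))

  # Entropy of the overall group distribution
  pg <- p_group[p_group > 0]
  e <- -sum(pg * log(pg))

  m / e
}

# Two units, two groups: complete separation versus identical composition
segregated <- matrix(c(100, 0, 0, 100), nrow = 2, byrow = TRUE)
even <- matrix(c(50, 50, 50, 50), nrow = 2, byrow = TRUE)

theil_h(segregated)  # groups never share a unit
theil_h(even)        # every unit mirrors the overall composition
```

Under this definition \(H\) runs from 0 (every unit mirrors the overall composition) to 1 (groups never share a unit), so the values in the table fall in the lower part of that range.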
2.3.1 Local segregation analysis
Patterns of segregation across the San Francisco–Oakland urban area:
53.7% of Santa Clara County residents speak a language other than English at home (2016-2021 ACS).
Code
scc_lang_plot <- scc_lang %>%
  ggplot(aes(x = fct_rev(fct_reorder(variable, percent)), y = percent)) +
  geom_col(color = "navy", fill = "navy", alpha = 0.5, width = 0.4) +
  scale_y_continuous(labels = label_percent(scale = 100)) +
  labs(
    title = "Languages spoken at home in Santa Clara County",
    subtitle = "2016-2021 ACS, population 5 years and over",
    x = "Language",
    y = "Percent"
  )

ggplotly(scc_lang_plot)